AITopics | adversarial transferability

Collaborating Authors

adversarial transferability

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Towards Building Model/Prompt-Transferable Attackers against Large Vision-Language Models

Neural Information Processing SystemsJun-23-2026, 04:21:51 GMT

Although Large Vision-Language Models (LVLMs) exhibit impressive multimodal capabilities, their vulnerability to adversarial examples has raised serious security concerns. Existing LVLM attackers simply optimize adversarial images that easily overfit a certain model/prompt, making them ineffective once they are transferred to attack a different model/prompt. Motivated by this research gap, this paper aims to develop a more powerful attack that is transferable to black-box LVLM models of different structures and task-aware prompts of different semantics. Specifically, we introduce a new perspective of information theory to investigate LVLMs' transferable characteristics by exploring the relative dependence between outputs of the LVLM model and input adversarial samples. Our empirical observations suggest that enlarging/decreasing the mutual information between outputs and the disentangled adversarial/benign patterns of input images helps to generate more agnostic perturbations for misleading LVLMs' perception with better transferability. In particular, we formulate the complicated calculation of information gain as an estimation problem and incorporate such informative constraints into the adversarial learning process. Extensive experiments on various LVLM models/prompts demonstrate our significant transfer-attack performance.

large language model, machine learning, natural language, (18 more...)

Neural Information Processing Systems

Genre:

Research Report > Experimental Study (1.00)
Overview (1.00)

Industry:

Leisure & Entertainment (1.00)
Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(2 more...)

Add feedback

Harnessing the Computation Redundancy in ViTs to Boost Adversarial Transferability

Neural Information Processing SystemsJun-18-2026, 07:58:01 GMT

Vision Transformers (ViTs) have demonstrated impressive performance across a range of applications, including many safety-critical tasks. Many previous studies have observed that adversarial examples crafted on ViTs exhibit higher transferability than those crafted on CNNs, indicating that ViTs contain structural characteristics favorable for transferable attacks. In this work, we take a further step to deeply investigate the role of computational redundancy brought by its unique characteristics in ViTs and its impact on adversarial transferability. Specifically, we identify two forms of redundancy, including the data-level and model-level, that can be harnessed to amplify attack effectiveness. Building on this insight, we design a suite of techniques, including attention sparsity manipulation, attention head permutation, clean token regularization, ghost MoE diversification, and learning to robustify before the attack. A dynamic online learning strategy is also proposed to fully leverage these operations to enhance the adversarial transferability. Extensive experiments on the ImageNet-1k dataset validate the effectiveness of our approach, showing that our methods significantly outperform existing baselines in both transferability and generality across diverse model architectures, including different variants of ViTs and mainstream Vision Large Language Models (VLLMs).

large language model, machine learning, natural language, (19 more...)

Neural Information Processing Systems

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Information Technology > Security & Privacy (0.70)
Education > Educational Setting > Online (0.34)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.88)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Consensus-Robust Transfer Attacks via Parameter and Representation Perturbations

Neural Information Processing SystemsJun-17-2026, 15:41:48 GMT

Adversarial examples crafted on one model often exhibit poor transferability to others, hindering their effectiveness in black-box settings. This limitation arises from two key factors: (i) decision-boundary variation across models and (ii) representation drift in feature space. We address these challenges through a new perspective that frames transferability for untargeted attacks as a consensus-robust optimization problem: adversarial perturbations should remain effective across a neighborhood of plausible target models. To model this uncertainty, we introduce two complementary perturbation channels: a parameter channel, capturing boundary shifts via weight perturbations, and a representation channel, addressing feature drift via stochastic blending of clean and adversarial activations. We then propose CORTA (COnsensus-Robust Transfer Attack), a lightweight attack instantiated from this robust formulation using two first-order strategies: (i) sensitivity regularization based on the squared Frobenius norm of logits' Jacobian with respect to weights, and (ii) Monte Carlo sampling for blended feature representations. Our theoretical analysis provides a certified lower bound linking these approximations to the robust objective. Extensive experiments on CIFAR-100 and ImageNet show that CORTA significantly outperforms state-of-the-art transfer-based methods-- including ensemble approaches--across CNN and Vision Transformer targets. Notably, CORTA achieves a 19.1 percentage-point gain in transfer success rate over the best prior method while using only a single surrogate model.

artificial intelligence, deep learning, machine learning, (16 more...)

Neural Information Processing Systems

Country:

North America (0.93)
Asia > China (0.28)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.67)

Industry:

Information Technology (0.47)
Transportation (0.34)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

Boosting Adversarial Transferability with Spatial Adversarial Alignment

Neural Information Processing SystemsJun-15-2026, 17:57:49 GMT

Deep neural networks are vulnerable to adversarial examples that exhibit transferability across various models. Numerous approaches are proposed to enhance the transferability of adversarial examples, including advanced optimization, data augmentation, and model modifications. However, these methods still show limited transferability, particularly in cross-architecture scenarios, such as from CNN to ViT. To achieve high transferability, we propose a technique termed Spatial Adversarial Alignment (SAA), which employs an alignment loss and leverages a witness model to fine-tune the surrogate model. Specifically, SAA consists of two key parts: spatial-aware alignment and adversarial-aware alignment.

artificial intelligence, deep learning, machine learning, (16 more...)

Neural Information Processing Systems

Country: North America > United States > California (0.28)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.92)

Industry: Information Technology > Security & Privacy (0.69)

Technology:

Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.66)

Add feedback

Harnessing the Computation Redundancy in ViTs to Boost Adversarial Transferability

Neural Information Processing SystemsJun-12-2026, 17:15:56 GMT

Vision Transformers (ViTs) have demonstrated impressive performance across a range of applications, including many safety-critical tasks. Many previous studies have observed that adversarial examples crafted on ViTs exhibit higher transferability than those crafted on CNNs, indicating that ViTs contain structural characteristics favorable for transferable attacks. In this work, we take a further step to deeply investigate the role of computational redundancy brought by its unique characteristics in ViTs and its impact on adversarial transferability. Specifically, we identify two forms of redundancy, including the data-level and model-level, that can be harnessed to amplify attack effectiveness. Building on this insight, we design a suite of techniques, including attention sparsity manipulation, attention head permutation, clean token regularization, ghost MoE diversification, and learn to robustify before the attack. A dynamic online learning strategy is also proposed to fully leverage these operations to enhance the adversarial transferability. Extensive experiments on the ImageNet-1k dataset validate the effectiveness of our approach, showing that our methods significantly outperform existing baselines in both transferability and generality across diverse model architectures, including different variants of ViTs and mainstream Vision Large Language Models (VLLMs).

artificial intelligence, natural language, proceedings, (6 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Natural Language (0.60)

Add feedback

Frequency-Domain Regularized Adversarial Alignment for Transferable Attacks against Closed-Source MLLMs

Yuan, Leitao, Mao, Qinghua, Liu, Daizong, Wang, Kun, Wang, Wenjie, Teng, Yan, Shao, Jing, Liu, Dongrui

arXiv.org Machine LearningMay-22-2026

Multimodal large language models (MLLMs) remain vulnerable to transfer-based targeted attacks, where perturbations optimized on open-source surrogate encoders can generalize to closed-source MLLMs. A key challenge for improving adversarial transferability is to effectively capture the intrinsic visual focus shared across different models, such that perturbations align with transferable semantic cues rather than surrogate-specific behaviors. However, existing methods suffer from spatial-domain feature redundancy and surrogate-specific gradient signals, thereby hindering cross-model transferability. In this paper, we propose FRA-Attack, which addresses both challenges from a unified frequency-domain regularization perspective. For feature alignment, a high-pass DCT objective on patch features suppresses redundant global structures and concentrates the loss on the high-frequency band that carries the MLLMs' intrinsic visual focus. For gradient optimization, we introduce Frequency-domain Gradient Regularization (FGR), a \textit{model-agnostic} low-pass regularizer that modulates the surrogate gradient using only the geometric frequency coordinate, \textit{i.e.}, no surrogate-derived statistic is involved, so that FGR is model-agnostic by construction, removing surrogate-specific high-frequency artifacts while preserving transferable low-frequency directions. Together, the two components form a unified frequency-domain treatment of transferability. Extensive experiments on $15$ flagship MLLMs across $7$ vendors show that FRA-Attack achieves superior cross-model transferability, particularly with state-of-the-art performance on GPT-5.4, Claude-Opus-4.6 and Gemini-3-flash.

large language model, machine learning, natural language, (20 more...)

arXiv.org Machine Learning

2605.21541

Genre: Research Report (0.64)

Industry: Information Technology > Security & Privacy (0.47)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Rethinking the Backward Propagation for Adversarial Transferability

Neural Information Processing SystemsApr-24-2026, 08:56:35 GMT

Transfer-based attacks generate adversarial examples on the surrogate model, which can mislead other black-box models without access, making it promising to attack real-world applications. Recently, several works have been proposed to boost adversarial transferability, in which the surrogate model is usually overlooked. In this work, we identify that non-linear layers (e.g.

artificial intelligence, machine learning, transferability, (13 more...)

Neural Information Processing Systems

Country: Asia > China (0.28)

Genre: Research Report > New Finding (0.46)

Industry: Information Technology > Security & Privacy (0.95)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Security & Privacy (0.95)
Information Technology > Artificial Intelligence > Vision (0.95)

Add feedback

Perturbation Towards Easy Samples Improves Targeted Adversarial Transferability

Neural Information Processing SystemsApr-24-2026, 05:57:59 GMT

The transferability of adversarial perturbations provides an effective shortcut for black-box attacks. Targeted perturbations have greater practicality but are more difficult to transfer between models. In this paper, we experimentally and theoretically demonstrated that neural networks trained on the same dataset have more consistent performance in High-Sample-Density-Regions (HSDR) of each class instead of low sample density regions. Therefore, in the target setting, adding perturbations towards HSDR of the target class is more effective in improving transferability. However, density estimation is challenging in high-dimensional scenarios.

artificial intelligence, deep learning, machine learning, (19 more...)

Neural Information Processing Systems

Genre: Research Report (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Add feedback

Boosting Adversarial Transferability by Achieving Flat Local Maxima

Neural Information Processing SystemsMar-22-2026, 04:40:07 GMT

Transfer-based attack adopts the adversarial examples generated on the surrogate model to attack various models, making it applicable in the physical world and attracting increasing interest. Recently, various adversarial attacks have emerged to boost adversarial transferability from different perspectives. In this work, inspired by the observation that flat local minima are correlated with good generalization, we assume and empirically validate that adversarial examples at a flat local region tend to have good transferability by introducing a penalized gradient norm to the original loss function. Since directly optimizing the gradient regularization norm is computationally expensive and intractable for generating adversarial examples, we propose an approximation optimization method to simplify the gradient update of the objective function. Specifically, we randomly sample an example and adopt a first-order procedure to approximate the curvature of the second-order Hessian matrix, which makes computing more efficient by interpolating two Jacobian matrices. Meanwhile, in order to obtain a more stable gradient direction, we randomly sample multiple examples and average the gradients of these examples to reduce the variance due to random sampling during the iterative process. Extensive experimental results on the ImageNet-compatible dataset show that the proposed method can generate adversarial examples at flat local regions, and significantly improve the adversarial transferability on either normally trained models or adversarially trained models than the state-of-the-art attacks.

artificial intelligence, machine learning, proceedings, (7 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.39)

Add feedback

Boosting Adversarial Transferability by Achieving Flat Local Maxima

Neural Information Processing SystemsFeb-17-2026, 12:48:47 GMT

Specifically, we randomly sample an example and adopt a first-order procedure to approximate the Hessian/vector product, which makes computing more efficient by interpolating two neighboring gradients.

adversarial example, artificial intelligence, machine learning, (19 more...)

Neural Information Processing Systems

Country: